Although open carry is not directly banned, New York City prohibits the possession of a “loaded” handgun outside of the home or place of business without a carry license. We usually think NYC are more strict on guns than some states and cities, but for the geographic location and political reasons, gun situation in NYC is much more complicated.
New York City has been the site of many Black Lives Matter protests in response to incidents of police brutality and racially motivated violence against black people, especially during George Floyd protests (May–June 2020).
The political issues have a great influence on gun violence as well, for example the change of mayor and changes of crime police.
Besides, COVID-19 literally changed lifestyle of New Yorkers, which contain the frequency and distribution of shooting cases in NYC.
We have mainly used two datasets – NYPD Shooting Incident and COVID-19 Daily Counts of Cases, Hospitalizations, and Deaths, both are from the NYC open data.
NYPD Shooting Incident: Contains information about the incident number, occurrence date, time, the demographic characteristics of the perpetrator and victims geographic location, etc. of the shooting cases happen in NYC during year 2006 to 2021, and was extracted each quarter by the Office of Management Analysis and Planning. It could serve as a great source for government and police to understand some potential nature behind shooting incidents. Each row represents an unique victim, the same incident key may occur several times, meaning the incident may have multiple victims. There are many values missing for perp_age_group, perp_sex and perp_race, which is due to the fact that the information of the perpetrator was unknown or not available at the time of collection. Key variables used in our analysis are:
incident_keyoccur_date, occur_timeincident_keystatistical_murder_flagperp_age_group, perp_sex, perp_racevic_age_group, vic_sex, vic_racelatitude, longitudeCOVID-19 Daily Counts of Cases, Hospitalizations, and Deaths: Contains the COVID-related hospitalizations and confirmed, probable death among the whole NYC and each borough. Managed by the NYC Department of Health and Mental Hygiene Incident Command System for COVID-19 Response and is updated everyday. We clean the data by renaming and changing the types of some variables. In addition, we have added a variable borough to show the borough information more explicitly. Key variables are:
month, day, yearboroughborough_case_counttotal_case_countlibrary(tidyverse)
library(rvest)
library(knitr)
library(leaflet)
library(rgdal)
library(lubridate)
library(plotly)
library(modelr)
col1 = "#d8e1cf"
col2 = "#438484"
theme_set(theme_minimal() + theme(legend.position = "bottom"))
options(
ggplot2.continuous.colour = "viridis",
ggplot2.continuous.fill = "viridis"
)
scale_colour_discrete = scale_colour_viridis_d
scale_fill_discrete = scale_fill_viridis_d
Load the historic and year-to-datedatasets of NYPD shooting incident
shooting_initial =
read_csv("./data/NYPD_Shooting.csv") %>% janitor::clean_names()
shooting_2021 = read_csv("./data/NYPD_shooting_New.csv") %>% janitor::clean_names()
#A variable name in shooting_new is different from the initial data, change column name in order to merge the data frames
shooting_2021 = shooting_2021 %>%
rename(lon_lat = new_georeferenced_column)
shooting = rbind(shooting_initial, shooting_2021)
check null
shooting %>%
summarise_all(~ sum(is.na(.)))
## # A tibble: 1 × 19
## incident_key occur_date occur_time boro precinct jurisdiction_code
## <int> <int> <int> <int> <int> <int>
## 1 0 0 0 0 0 3
## # … with 13 more variables: location_desc <int>, statistical_murder_flag <int>,
## # perp_age_group <int>, perp_sex <int>, perp_race <int>, vic_age_group <int>,
## # vic_sex <int>, vic_race <int>, x_coord_cd <int>, y_coord_cd <int>,
## # latitude <int>, longitude <int>, lon_lat <int>
For col boro
shooting = shooting %>%
mutate(boro = as.factor(boro)) %>%
mutate(location_desc = replace_na(location_desc, "NONE")) %>%
mutate(location_desc = as.factor(location_desc)) %>%
separate(occur_date, into = c("month", "day", "year")) %>%
mutate(month = as.numeric(month)) %>%
arrange(year, month) %>%
mutate(year = as.character(year)) %>%
mutate(boro = tolower(boro)) %>%
mutate(boro = if_else(boro == "staten island", "staten_island", boro)) %>%
rename(borough = boro) %>%
mutate(date = str_c(month, day, year, sep = "/")) %>%
select(incident_key, date, everything())
Next, clean the COVID-19 case count data
Importing COVID-19 case count data
covid_counts = read.csv("./data/COVID19_data.csv", sep = ";") %>% as_tibble()
The clean dataset contains only day-by-day COVID-19 case count for each borough and the total case count in NYC of a particular day.
clean_covid = covid_counts %>%
janitor::clean_names() %>%
rename(date = date_of_interest) %>%
select(date, contains("case_count")) %>%
select(-contains(c("probable_case_count", "case_count_7day_avg", "all_case_count_7day_avg"))) %>%
separate(date, into = c("month", "day", "year")) %>%
mutate_all(as.character) %>%
mutate_if(is.character, gsub, pattern = ",", replacement = "") %>%
mutate_if(is.character, as.numeric) %>%
pivot_longer(
cols = bx_case_count:si_case_count,
names_to = "borough",
values_to = "borough_case_count"
) %>%
mutate(borough = gsub("_case_count", "", borough)) %>%
mutate(borough = recode(borough, "bx" = "bronx","bk" = "brooklyn","mn" = "manhattan","si" = "staten_island","qn" = "queens")) %>%
relocate(case_count, .after = borough_case_count) %>%
rename(total_case_count = case_count) %>%
mutate(date = str_c(month, day, year, sep = "/")) %>%
select(date, everything())
shooting_heatmap = shooting_initial %>%
mutate(occur_date = as.Date(occur_date,'%m/%d/%Y')) %>%
mutate(occur_date = weekdays(occur_date)) %>%
separate(occur_time, into = c("hour", "minute", "second")) %>%
mutate(hour = as.factor(hour)) %>%
select(incident_key, occur_date, hour) %>%
mutate(occur_date = as.factor(occur_date),
occur_date = fct_relevel(occur_date, "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" , "Saturday"))
dayHour = plyr::ddply(shooting_heatmap, c( "hour", "occur_date"), summarise, N = length(incident_key))
attach(dayHour)
heatmap = ggplot(dayHour, aes(hour, occur_date)) +
geom_tile(aes(fill = N),colour = "white", na.rm = TRUE) +
scale_fill_gradient(low = col1, high = col2) +
guides(fill = guide_legend(title = "Total Shooting Cases")) +
theme_bw() +
theme_minimal() +
labs(title = "Time Based Heatmap",
x = "Shooting Cases Per Hour", y = "Day of Week") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
heatmap

According to the result of this heatmap, the midnight of weekends(Sunday and Saturday) have the highest risk of shooting cases. Additionally, daytime between 7 in the morning and 19 in the evening seems to have lower shooting cases than the other time of the day.
Is this situation happened in every boro?
In Brooklyn
heatmap_bn = shooting_initial %>%
filter(boro == "BROOKLYN") %>%
mutate(occur_date = as.Date(occur_date,'%m/%d/%Y')) %>%
mutate(occur_date = weekdays(occur_date)) %>%
separate(occur_time, into = c("hour", "minute", "second")) %>%
mutate(hour = as.factor(hour)) %>%
select(incident_key, occur_date, hour) %>%
mutate(occur_date = as.factor(occur_date),
occur_date = fct_relevel(occur_date, "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" , "Saturday"))
dayHour = plyr::ddply(heatmap_bn, c( "hour", "occur_date"), summarise, N = length(incident_key))
attach(dayHour)
heatmap_Bn = ggplot(dayHour, aes(hour, occur_date)) +
geom_tile(aes(fill = N),colour = "white", na.rm = TRUE) +
scale_fill_gradient(low = col1, high = col2) +
guides(fill = guide_legend(title = "Total Shooting Cases")) +
theme_bw() +
theme_minimal() +
labs(title = "Time Based Heatmap in Brooklyn",
x = "Shooting Cases Per Hour", y = "Day of Week") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
heatmap_Bn
In Bronx
heatmap_bx = shooting_initial %>%
filter(boro == "BRONX") %>%
mutate(occur_date = as.Date(occur_date,'%m/%d/%Y')) %>%
mutate(occur_date = weekdays(occur_date)) %>%
separate(occur_time, into = c("hour", "minute", "second")) %>%
mutate(hour = as.factor(hour)) %>%
select(incident_key, occur_date, hour) %>%
mutate(occur_date = as.factor(occur_date),
occur_date = fct_relevel(occur_date, "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" , "Saturday"))
dayHour = plyr::ddply(heatmap_bx, c( "hour", "occur_date"), summarise, N = length(incident_key))
attach(dayHour)
heatmap_Bx = ggplot(dayHour, aes(hour, occur_date)) +
geom_tile(aes(fill = N),colour = "white", na.rm = TRUE) +
scale_fill_gradient(low = col1, high = col2) +
guides(fill = guide_legend(title = "Total Shooting Cases")) +
theme_bw() +
theme_minimal() +
labs(title = "Time Based Heatmap in Bronx",
x = "Shooting Cases Per Hour", y = "Day of Week") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
heatmap_Bx

In Queens
heatmap_q = shooting_initial %>%
filter(boro == "QUEENS") %>%
mutate(occur_date = as.Date(occur_date,'%m/%d/%Y')) %>%
mutate(occur_date = weekdays(occur_date)) %>%
separate(occur_time, into = c("hour", "minute", "second")) %>%
mutate(hour = as.factor(hour)) %>%
select(incident_key, occur_date, hour) %>%
mutate(occur_date = as.factor(occur_date),
occur_date = fct_relevel(occur_date, "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" , "Saturday"))
dayHour = plyr::ddply(heatmap_q, c( "hour", "occur_date"), summarise, N = length(incident_key))
attach(dayHour)
heatmap_q = ggplot(dayHour, aes(hour, occur_date)) +
geom_tile(aes(fill = N),colour = "white", na.rm = TRUE) +
scale_fill_gradient(low = col1, high = col2) +
guides(fill = guide_legend(title = "Total Shooting Cases")) +
theme_bw() +
theme_minimal() +
labs(title = "Time Based Heatmap in Queens",
x = "Shooting Cases Per Hour", y = "Day of Week") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
heatmap_q

In Manhattan
heatmap_m = shooting_initial %>%
filter(boro == "MANHATTAN") %>%
mutate(occur_date = as.Date(occur_date,'%m/%d/%Y')) %>%
mutate(occur_date = weekdays(occur_date)) %>%
separate(occur_time, into = c("hour", "minute", "second")) %>%
mutate(hour = as.factor(hour)) %>%
select(incident_key, occur_date, hour) %>%
mutate(occur_date = as.factor(occur_date),
occur_date = fct_relevel(occur_date, "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" , "Saturday"))
dayHour = plyr::ddply(heatmap_m, c( "hour", "occur_date"), summarise, N = length(incident_key))
attach(dayHour)
heatmap_m = ggplot(dayHour, aes(hour, occur_date)) +
geom_tile(aes(fill = N),colour = "white", na.rm = TRUE) +
scale_fill_gradient(low = col1, high = col2) +
guides(fill = guide_legend(title = "Total Shooting Cases")) +
theme_bw() +
theme_minimal() +
labs(title = "Time Based Heatmap in Manhattan",
x = "Shooting Cases Per Hour", y = "Day of Week") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
heatmap_m

In Staten Island
heatmap_l = shooting_initial %>%
filter(boro == "STATEN ISLAND") %>%
mutate(occur_date = as.Date(occur_date,'%m/%d/%Y')) %>%
mutate(occur_date = weekdays(occur_date)) %>%
separate(occur_time, into = c("hour", "minute", "second")) %>%
mutate(hour = as.factor(hour)) %>%
select(incident_key, occur_date, hour) %>%
mutate(occur_date = as.factor(occur_date),
occur_date = fct_relevel(occur_date, "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" , "Saturday"))
dayHour = plyr::ddply(heatmap_l, c( "hour", "occur_date"), summarise, N = length(incident_key))
attach(dayHour)
heatmap_l = ggplot(dayHour, aes(hour, occur_date)) +
geom_tile(aes(fill = N),colour = "white", na.rm = TRUE) +
scale_fill_gradient(low = col1, high = col2) +
guides(fill = guide_legend(title = "Total Shooting Cases")) +
theme_bw() +
theme_minimal() +
labs(title = "Time Based Heatmap in Staten Island",
x = "Shooting Cases Per Hour", y = "Day of Week") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
heatmap_l

For the shooting incidents across time, we use three different levels to analyze. Firstly, we compared shooting case year by year. ### Distribution of Shooting Case of Years
shooting_year = shooting %>%
group_by(year) %>%
summarise(n_obs = n())
#visualization shooting incidence trend
shooting_year %>%
plot_ly( x = ~year, y = ~n_obs, type = "scatter", mode = "lines+markers") %>%
layout(title = "Shooting Incidence Trend from 2006 to 2021",
xaxis = list(title = "Year"),
yaxis = list(title = "Frequency"))
By observing the data set, the shooting incidence gradually decrease from 2055 cases in 2006 to 967 cases in 2019. However, due to the Covid-19 pandemic and responses to large-scale protests over the killing of George Floyd, there is a sharply surge of shooting incidents in 2020 which have 1948 cases. Since the data for 2021 is only from January to September 30th, we are not sure whether there is a decrease in the year 2021 compared to 2020.
Then we take a look of average shooting cases between months from 2006 to 2021 in the New York City.
shooting_month = shooting %>%
mutate(month = as.factor(month)) %>%
group_by(month) %>%
summarise(n_obs = n())
shooting_month %>%
plot_ly(x = ~month, y = ~n_obs, color = ~month, type = "bar") %>%
layout( title = "The Distribution of Shooting Incidence by Month",
xaxis = list(title = "Month"),
yaxis = list(title = "Frequency"))
The distribution of the shooting incidence by month has a bell-shape. The shooting case concentrated in summer from May to September. The reason of this may related to the time of memorial day and the labor day.
shooting_time = shooting %>%
##format the occur_time variable to only hours.
mutate(occur_time_hour = format(as.POSIXct(occur_time), format = "%H")) %>%
mutate(occur_time_hour = as.numeric(occur_time_hour)) %>%
group_by(occur_time_hour) %>%
summarise(case_numb = n())
#divide day time to 4 groups: 0-6;6-12;12-18;18-24
shooting_time = shooting_time %>%
mutate(occur_time_range = case_when(
occur_time_hour >= 0 & occur_time_hour < 6 ~ "0-6",
occur_time_hour >= 6 & occur_time_hour < 12 ~ "6-12",
occur_time_hour >= 12 & occur_time_hour < 18 ~ "12-18",
occur_time_hour >= 18 & occur_time_hour < 24 ~ "18-24"))
shooting_time = shooting_time %>%
mutate(occur_time_range = factor(occur_time_range, levels = c("0-6","6-12","12-18","18-24")))
ggplot(shooting_time, aes(x = occur_time_range, y = case_numb, fill = occur_time_range)) + geom_col(alpha = 1) + labs(x = "Occur Time Range",
y = "Frequency",
title = "Distribution of Shooting Case by Day")

#pie chart of ratio of shooting cases in a day
ggplot(shooting_time, aes(x = "", y = case_numb, fill = occur_time_range)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start = 0) +
scale_fill_brewer(palette = "Pastel1") +
theme_void() +
guides(fill = guide_legend(title = "occur Time range")) +
labs(title = "Pie Chart for Distribution of Shooting Case by Day") +
theme(legend.position = "right")

From the bar chart, it is obvious that most of the shooting cases happens in the evening and the late night, which concentrated in 18-24 and 0-6. The pie chart clearly show the occupy of shooting cases time range take place in a day.
We would like to focus the gun violence in the year of 2020 as a critical year of surge. (Since the covid-19 starts from March 2020, to see if there any relation between shooting and Covid.)
shooting_2020 = shooting %>%
filter(year == "2020") %>%
mutate(month = as.factor(month)) %>%
group_by(month) %>%
summarise(n_obs = n())
ggplot(shooting_2020, aes(x = month, y = n_obs, fill = month)) + geom_col(alpha = 1) + labs(
x = "Month",
y = "Frequency",
title = "Distribution of Shooting Case across month in 2020 in NYC")
The major rise in gun violence in the city began in 2020, after a period in which violent crime dropped to its lowest levels in more than six decades. For the first half of the year, Gun violence is relatively low as the city shut down early in the pandemic. The spike of shooting cases during June to August, which is mainly because of the death of George Floyd.
Since 2020 is the critical year, we would like to analyze the average shooting case by month between year 2019 and 2020.
shooting_2019_2020 = shooting %>%
filter(year == "2019" | year == "2020") %>%
mutate(month = as.factor(month)) %>%
group_by(year, month) %>%
summarise(frequency = n())
ggplot(shooting_2019_2020, aes(x = month, y = frequency, fill = year)) + geom_bar(stat = "identity", position = position_dodge(), alpha = 0.75) + labs(
x = "Month",
y = "Frequency",
title = "Shooting Case Across Month in NYC in 2019 & 2020")
According to the plot, besides there is year on year to growth between 2019 to 2020, the distribution of shooting case across month are the same.
shooting_map = shooting %>%
mutate_at(c("perp_age_group", "perp_sex", "perp_race"), funs(ifelse(is.na(.), "UNKNOWN", .))) %>%
mutate(labels = str_c("<b>Incident Key: </b>", incident_key,
"<br>", "<b>Date: </b>", date,
"<br>", "<b>Borough: </b>", borough,
"<br>", "<b>Murdered: </b>", statistical_murder_flag,
"<br>", "<b>Perpetrator's Race: </b>", perp_race,
"<br>", "<b>Victim's Race: </b>", vic_race,
"<br>", "<b>Perpetrator's Age: </b>", perp_age_group,
"<br>", "<b>Victim's Age: </b>", vic_age_group
))
nyc_boro = readOGR("./data/Borough_Boundaries/geo_export_2204bc6b-9c17-46ed-8a67-7245a1e15877.shp", layer = "geo_export_2204bc6b-9c17-46ed-8a67-7245a1e15877", verbose = FALSE)
polygon_color <- colorFactor(
palette = "viridis",
domain = as.factor(nyc_boro@data$boro_name))
shooting_map %>%
leaflet() %>%
addTiles() %>%
addProviderTiles("CartoDB.Positron") %>%
addMarkers(lng = ~longitude, lat = ~latitude, popup = ~labels,
clusterOptions = markerClusterOptions()) %>%
addPolygons(data = nyc_boro,
weight = 0.85,
fillColor = ~polygon_color(nyc_boro@data$boro_name),
# fillOpacity = 0.6,
color = "#BDBDC3",
label = ~nyc_boro@data$boro_name)